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(54) High speed modular exponentiator 

(57) A modular exponentiator is adapted to receive 
a first communicated signal and derive a second signal 
therefrom by computation of a modular exponentiation 
of the form 6* mod n based on the first signal. The mod- 
ular exponentiator divides the modular exponentiation 
according to the Chinese remainder theorem into first 
and second portions respectively having modulus val- 
ues p and q of approximately half of an original modulus 



value n of the modular exponentiation. Each portion of 
the modular exponentiation is factored into respective 
pluralities of smaller modular exponentiations having 
precalculated exponent values. The respective plurali- 
ties of smaller modular exponentiations are then multi- 
plied together to provide respective intermediate prod- 
ucts. The intermediate products are then recombined to 
yield the modular exponentiation result. 
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Description 

BACKGROUND OF THE INVENTION 
5 1. Field of the Invention 

n*„,^ r P , reSen i inVen,i0n r °!T* ! ° cr yP to 9 ra P hic s y stems - and ™*° particularly, to a highly efficient modular expo- 
nents for performing modular reduction operations integral to cryptographic key calculations. 

10 2. Description of Related Art 

Cryptographic systems are commonly used to restrict unauthorized access to messages communicated over oth- 
erwise rnsecure channels. In general, cryptographic sysfems use a unique key. such as a series of numbers, to control 
an algorithm used to encrypt a message before it is transmitted over an insecure communication channel to a receiver 
The receiver must have access to the same key in order to decode the encrypted message. Thus, it is essential that 
^° mmu f n,cated ,n advance b V the "nder to the receiver over a secure channel in order to maintain the 
security of the cryptographic system; however, secure communication of the key is hampered by the unavailability and 
expense of secure communication channels. Moreover, the spontaneity of most business communications is impeded 
by the need to communicate the key in advance. 

In view of the difficulty and inconvenience of communicating the key over a secure channel, soiled public key 
cryptographic systems are proposed in which a key may be communicated over an insecure channel without jeopard- 
izing the security of the system. A public key cryptographic system utilizes a pair of keys in which one is publicly 
communicated, i.e. the public key. and the other is kept secret by the receiver, i.e.. the private key. While the private 
key ,s mathematically related to the public key. it is practically impossible to derive the private key from the public key 
alone. In this way, the public key is used to encrypt a message, and the private key is used to decrypt the message 
Such cryptographic systems often require computation of modular exponentiations of the f orm y = b° mod n in 

» , eX , P °?! nt 6 a " d m0dU ' US " 3re eX,femely ,arge numbers - e "9- havin 9 a le "9th of 1,024 binary digits 
or bits. If, for example, the exponent e were transmitted as a public key, and the base b and modulus n were known 

«»1 1 '? 1 ' 3 k6y V C0U ' d be defiVed by com P u,i "9 the modular exponentiation. It would require 

such a extremely large amount of computing power and time to factor the private key y from the exponent e without 
knowledge of the base b and modulus n, that unauthorized access to the decrypted message is virtually precluded as 
3 practical matter. 

A drawback of such cryptographic systems is that calculation of the modular exponentiation remains a daunting 
mathematical task even to an authorized receK/er using a high speed computer. With the prevalence of public computer 
networks used to transmit confidential data for personal, business and governmental purposes, it is anticipated that 
most computer users will want cryptographic systems to control access to their data. Despite the increased security 
he difficulty of the modular exponentiation calculation will substantially drain computer resources and degrade data 
syZms ratGS ' thUS fepreSentS 3 mai ° r im P ediment to the widespread adoption of commercial cryptographic 
Accordingly a critical need exists for a high speed modular exponentiation method and apparatus to provide a 
m^hSti^e? COmmUniCali ° n SGCUrity W ™ 6 minimi2, ' ng the im P act to com P uter system performance and data 

SUMMARY OF THE INVENTION 

In accordance with the teachings of the present invention, a highly efficient modular exponential for a crypto- 
graphic system is provided. yH 
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The cryptographic system comprises a processing unit having a modular exponentiator that is adapted to receive 
a first communicated signal and derive a second signal therefrom by computation of a modular exponentiation of the 
form be mod n based on the first signal. The modular exponentiator divides the modular exponentiation into first and 
second portions respectively having a modulus value of approximately half of an original modulus value n of the modular 
exponent. a .on. Each portion of the modular exponentiation is factored into respective pluralities of smaller modular 
exponentiations hav.ng precalculated exponent values. The respective pluralities of smaller modular exponentiations 
are then calculated and recombined together. 

Particularly, the modular exponentiation is divided into portions in accordance with the Chinese remainder theorem 
in which the first portion of the modular exponentiation comprises a component *>p mod p and the second portion of 
Ula ^ Xp0nen,,a,lon ™»P'™s a component J* mod o in which p and q are prime numbers having a product 
equal to n. The exponents e p and e q of the modular exponentiation portions relate to the original exponent e as emod 
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(p-1 ) and emod ), respectively. The exponents e p and e q are factored by use of an exponent bit-scanning technique 
in which the exponent is loaded into a register having a shiftable window of a size substantially less than the corre- 
sponding size of the register. In the exponent bit-scanning technique, a pre-computed value is selected which comprises 
the base b raised to a selected power equivalent to a portion of the exponent e read through the window The pre- 
s computed value is squared a number of times corresponding to successive bit shifts of the window relative to the 
exponent e. The multiplying operation utilizes a Montgomery multiplication process having a speed-up routine for squar- 
ing operations. 

A more complete understanding of the high speed modular exponentiator will be afforded to those skilled in the 
art, as well as a realization of additional advantages and objects thereof, by a consideration of the following detailed 
io description of the preferred embodiment. Reference will be made to the appended sheets of drawings which will first 
be described briefly. 

BRIEF DESCRIPTION OF THE DRAWINGS 

is Fig. 1 is a block diagram of an exemplary application of a modular exponentiator within a cryptographic system; 

Fig. 2 is a block diagram of the modular exponentiator; 

Fig. 3 is a system level flow diagram of the functions performed by the modular exponentiator; 
Fig. 4 is a flow chart showing an exponent bit scanning operation performed by the modular exponentiator; 
Fig. 5a-c are block diagrams of an exponent register within various stages of the exponent bit scanning operation 
20 of Fig. 4; 

Fig. 6 is a flow chart showing a multiplication operation performed by the modular exponentiator; 

Fig. 7 is a flow chart showing a squaring operation performed in conjunction with the multiplication operation of 

Fig. 6; 

Fig. 8 is a chart showing an exemplary exponent bit scanning operation in accordance with the flow chart of Fig 
25 4; and 

Fig. 9 is a chart showing an exemplary multiplication and squaring operation in accordance with the flow charts of 
Figs. 6 and 7. 

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

30 

The present invention satisfies the need for a high speed modular exponentiation method and apparatus which 
provides a sufficient level of communication security while minimizing the impact to computer system performance and 
data throughput rates. In the detailed description that follows, like element numerals are used to describe like elements 
in one or more of the figures. 

35 Referring first to Fig. 1, a block diagram of an application of a modular exponentiator 20 within an exemplary 

cryptographic system 10 is illustrated. The exemplary cryptographic system 1 0 includes a central processing unit (CPU) 
12, a random access memory (RAM) 14, a read only memory (ROM) 16, and modular exponentiator 20. Each of the 
elements of the cryptographic system 10 are coupled together by a bi-directional data and control bus 18, over which 
data and control messages are transmitted. The CPU 12 controls the operation of the cryptographic system 10, and 

40 may be provided by a conventional microprocessor or digital signal processor circuit. The RAM 1 4 provides temporary 
data storage for operation of the CPU 12, and fhe ROM 16 provides for non-volatile storage of an instruction set, i.e., 
software, that is executed in a sequential manner by the CPU 12 to control the overall operation of the cryptographic 
system 10. The modular exponentiator 20 may comprise a special function device, such as an application specific 
integrated circuit (ASIC) or field programmable gate array (FPGA), that is accessed by the CPU 1 2 to perform modular 

45 exponentiation operations. Alternatively, the elements of the cryptographic system 10 may all be contained within a 
single ASIC or FPGA in which the modular exponentiator 20 is provided as an embedded core process. 

As known in the art, the cryptographic system provides an interface between a non-secure communication channel 
and a data user. The cryptographic system receives encrypted data from an external source, such as a remote trans- 
mitter (not shown) which is communicating with the cryptographic system over the communication channel. The en- 

50 crypted data is decrypted by the cryptographic system, and the decrypted data is provided to the data user. Conversely, 
the data user provides decrypted data to the cryptographic system for encryption and subsequent transmission across 
the communication channel. The cryptographic system also receives and transmits various non-encrypted messages, 
such as control data and the public key information. It should be apparent that all communications with the cryptographic 
system occur via the data and control bus 18. 

55 The modular exponentiator 20 is illustrated in greater detail in Fig. 2. The modular exponentiator 20 comprises an 

interface logic unit 22, a pair of parallel processing units 24a. 24b, and a RAM 25, which all communicate internally 
over a data and control bus 27. The interface logic unit 22 controls communications between the modular exponentiator 
20 and the data and control bus 18 of the cryptographic system 10 described above. The processing units 24a, 24b 
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comprise respective control units 26a, 26b and multiplier units 28a, 28b, which furthercomprise internal circuit elements 
that execute a modular exponentiation process, as will be further described below. The RAM 25 provides for temporary 
storage of data values generated by the control units 26a, 26b and multiplier units 28a. 28b while executing a modular 
exponentiation operation. a 

Referring now to Fig. 3 in conjunction with Fig. 2 described above, a system level flow diagram of the functions 
performed by the modular exponentiator 20 is illustrated. As shown at step 101. the modular exponentiator 20 will 
compute a modular exponentiation of the form y = b° mod n. in which the modulus n, base b and exponent e are each 
k bits long. In a preferred embodiment of the present invention, k is 1,024 bits. Using conventional methods solving 
such a modular exponentiation would require a tremendous amount of computing power due to the large number and 
size of the multiplications and modular reductions that must be performed. In the present invention, the modular ex- 
ponent.at.on .s solved in a highly efficient manner by reducing the size of the problem and by reducing the number of 
multiplications that are performed. 

As a first step in solving the modular exponentiation, the original exponentiation is split into components, as follows: 

b° mod n = (((g" 1 mod p * (*>/ p mod p + p - b/'mod q)) mod p)* q) + b/'mod q 

in which p and q are large prime numbers whereby n = p * q. For maximum security, p and q should be roughly the 
same size. The term q-i mod p is a special value called an inverse which is derived from the Chinese remainder 
theorem, as known in the art. In particular, q-i mod p is the inverse of q mod p. Since the inverse represents a modular 
exponentiation of the same order as b° P mod p. the inverse may be pre-calculated in advance, and stored in the RAM 
25 at step 108. The values e p and e q are k/2 bit values equal to emod (p-1) and e mod (q-1), respectively. A reduced 
base term b r for each of b°P mod p and b»o mod q is provided by taking a modular reduction of b with respect to p and 
q, respectively. The reduced base terms b r thus have a k/2 bit length as well. 

Splitting the modular exponentiation permits its solution in two parallel paths, as illustrated in Fig 3 which are 
processed separately by the respective processing units 24a, 24b of Fig. 2. At steps 104, 105, the modular exponen- 
tiations b °p mod p and b° q mod q are calculated separately using techniques that will be further described below. The 
b r terms of each of the two modular exponentiations may be pre-calculated in advance, and stored in the RAM 25 at 
steps 102, 103. 

Since p and q are each respectively k/2 bits in length, the magnitude of the respective problems is thus reduced 
substantially from its original form. Moreover, the parallel calculation of two reduced-size modular exponentiations 
requires substantially less computer processing time than a corresponding calculation of the original modular expo- 
nentiation within a single processing unit. The reduction in processing time results from the fact that the number of 
multiplies needed to perform an exponentiation with an efficient algorithm (such as described below) is proportional to 
2^+s, where s .s equal to k divided by the multiplication operand size in bits. If an s word problem was treated as two 
separate s/2 word problems, the number of multiply operations per exponentiation is reduced to a value proportional 
to i" + 2 For example, if k were 1,024 bits and the multiplication operand were 128 bits, s would be equal to 8 
Accordingly, an s word problem would require a number of multiply operations proportional to 1 36, while the two sep- 
arate s/2 word problems would respectively require anumberof multiply operations proportional to 36 Thus the number 
of multiply operations is reduced by 3.778 times. 

Following the calculations of steps 1 04, 105, the b,<* mod q term is subtracted from bfi> mod p. and the result is 
added to p at step 106. At step 107, the resulting sum is multiplied by the inverse q-1 mod p which was pre-calculated 
at step 108. This step may be performed by one of the multipliers 28a, 28b, which are optimized for modular operations 
as will be further described below. The resulting product is modularly reduced with respect to p at step 109 and further 
multiplied by q at step 1 1 0 to produce a k-bit value. Lastly, the product of that final multiplication is added' to b «M mod 
q at step 111, which was previously calculated at step 105. It should be appreciated that the modular reduction that 
occurs at step 109 is much easier than the original modular exponentiation in view of the substantial reduction in size 
of the original be term. This final solution to the modular exponentiation is provided to the data and control bus 18 for 
further use by the CPU 12. 

Referring now to Figs. 4 and 5a-c, the modular exponentiations of b°o mod p and 6 r «* mod q from steps 1 04 1 05 
of Fig. 3 are shown in greater detail. Specifically. Fig. 4 illustrates a flow chart describing a routine referred to herein 
as exponent bit-scanning, which is used to reduce the number of multiplications necessary to perform an exponenti- 
ation. In general, the exponent bit-scanning routine factors the exponentials V'and b/o into a product of precomputed 
powers of the reduced base b r modularly reduced with respect to p or q. The routine may be coded in firmware and 
executed sequent.ally by the respective processing units 24a. 24b described above in the form of a software program 
Alternatively, the routine may be hardwired as discrete logic circuits that are optimized to perform the various functions 
of the exponent bit-scanning routine. For convenience, the description that follows will refer only to the operation ot 
the exponent bit scanning routine with respect to the exponential b r "». but it should be appreciated that a similar oper- 
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ation must be performed with respect to the exponential b*v. 

The exponent bit-scanning routine is called at step 200, and a running total is initialized to one at step 201. An 
exponent to be bit-scanned is loaded into a register at step 202. Figs. 5a-c illustrate a k-brt exponent e (i.e., e k . r 
e 0 ) loaded into a register 32. The register 32 may comprise a predefined memory space within the RAM 25. First, a 
window 34 is defined through which a limited number of bits of the exponent e are accessed. A window size of three 
bits is used in an exemplary embodiment of the present invention, though it should be appreciated that a different 
number could also be advantageously utilized. The window 34 is shifted from the left of the register 32 until a one 
appears in the most significant bit (MSB) of the 3-bit window, as shown by a loop defined at steps 203 and 204. In step 
203, the MSB is checked for presence of a one, and if a one is not detected, the window 34 is shifted by one bit to the 
right at step 204. Fig. 5b illustrates the window 34 shifted one bit to the right. It should be apparent that steps 203 and 
204 will be repeated until a one is detected. 

At step 205, a one has been detected at the MSB, and the value of the three-bit binary number in the window 34 
is read. The number is necessarily a 4, 5, 6 or 7 (i.e. , binary 1 00, 1 01 , 1 1 0 or 1 1 1 , respectively) since the MSB is one. 
At step 206, a pre-computed value for the reduced base b r raised to the number read from the window 34 (i.e., 6 r 4 , 
b r s , b, 6 or b/ t respectively) is fetched from memory. This pre-computed value is multiplied by a running total of the 
exponentiation at step 207. It should be appreciated that in the first pass through the routine the running total is set to 
one as a default. 

Thereafter, a loop begins at step 209 in which the register 32 is checked to see if the least significant bit (LSB) of 
the exponent e p has entered the window 34. Significantly, step 209 checks for the LSB of the entire exponent e in 
contrast with step 203 which reads the MSB of the window 34. If the LSB has not yet entered the window 34, the loop 
continues to step 212 at which the window 34 is successively shifted to the right, and step 213 in which the running 
total is modular squared with each such shift. The loop is repeated three times until the previous three bits are no 
longer in the window 34, i.e., three shifts of the window. Once three shifts have occurred, the routine determines at 
step 216 whether the MSB is one. If so, the routine returns to step 205, and the value in the window 34 is read once 
again. Alternatively, if the MSB is zero, then the register 32 is again checked at step 217 to see if the LSB of the 
exponent e p has entered the window 34. If the LSB is not in the window 34, the loop including steps 212 and 213 is 
again repeated with the window again shifted one bit to the right and the running total modular squared with the shift. 

If, at step 217, the LSB has entered the window 34, this indicates that the end of the exponent e p has been reached 
and the exponent bit-scanning routine is almost completed. At step 222, the last two bits in the window 34 are read, 
and at step 223 the running total is multiplied by the reduced base o r the number of times the value read in the window. 
For example, if the value of the lower two bits is a one, two, or three (i.e., binary 01, 10 or 11, respectively), then the 
previous running total is multiplied by the reduced base 6 r one, two or three times, respectively. If the value of the lower 
two bits is a 0, then the running total is not changed (i.e, multiplied by one). Then, the exponent bit-scanning routine 
ends at step 224. 

Returning to step 209 discussed above, before the loop begins, the register 32 is checked to see if the LSB of the 
exponent e p has entered the window 34. If the LSB has entered the window 34, a series of step are performed in which 
the count value is checked. The count value keeps track of the number of passes through the above-described loop 
that have taken place. If the count value is three, indicating that all of the bits in the window 34 have been previously 
scanned, then the exponent bit-scanning routine ends at step 224. If the count value is two, then all but the last bit in 
the window 34 has been previously scanned, and at step 221 , the value of the last bit is read. If the count value is one, 
then only the first bit in the window 34 has been previously scanned, and at step 222, the value of the last two bits is 
read (as already described above). Once again, at step 223 the running total is multiplied by the reduced base 6 r the 
number of times the value read in the window Then, the exponent bit-scanning routine ends at step 224. 

An example of the exponent bit-scanning technique is illustrated in Fig. 8 with respect to a modular exponentiation 
of abase b raised to a ten-bit exponent e, in which e- 1011010011. The successive shifts reduce the exemplary term 
6 ionoiooii to ((((((((b5)2)2 ) 2 )+6 5 ) 2 ) 2 ) 2 ) 2* /7 3 > Sjnce the tenrn & was pre calculated and fetched from memory, processing 
time is saved by not having to calculate that term. In addition, there are additional processing time savings that are 
achieved in performing a modular reduction of the exemplary term with respect to n due to the distributive nature of 
modular reduction. Rather than a huge number of multiplications followed by an equally huge modular reduction, only 
nine multiplications and modular reductions are required, and the modular reductions are smaller in magnitude since 
the intermediate values are smaller. 

It should be appreciated that the modular squaring step that occurs with each shift is necessary since the exponent 
bit-scanning begins at the MSB of the exponent e p where the window value is not really 4, 5, 6 or 7, but is actually 4, 
5, 6 or 7 times 2 k where k is the exponent bit position for the window's LSB bit. Since the value of the exponent e is 
interpreted as a power of the base b r a factor of 2 k implies squaring k times. Multiplying by a precalculated value when 
the window MSB is one is used to insure that all ones in the exponent e p are taken into account and to reduce the total 
number of pre -calculated values that are needed. 

Even though the exponent bit -scanning routine has reduced the number of multiplications that have to be performed 
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in the respective calculations of *,* mod p and b**mo6 q, there still are a number of multiplications that need to be 
performed. The modular exponentiator 20 utilizes an efficient multiplication algorithm for modular terms referred to in 
the art as Montgomery multiplication. The Montgomery algorithm provides that: 

Mont (a, 6) = ^-^modn 
2* 

where k is the number of bits in the modulus n, n is relatively prime to 2* and n>a, n>b. In order to use the algorithm 
for repeated multiplies, the values of a and b must be put into Montgomery form prior to performing the Montgomery 
multiply, where: 

x*2 k modn = x Monl 

If the two values to be Montgomery multiplied are in Montgomery form, then the result will also be in Montgomery form 
Fig. 6 illustrates a flow chart describing a Montgomery multiplication operation executed by the modular exponen- 
tiator 20. As with the exponent bit-scanning routine described above with respect to Fig. 4, the Montgomery multipli- 
cation operation may be coded in firmware and executed sequentially within the respective processing units 24a 24b 
by the control units 26a, 26b which access the multipliers 28a, 28b for particular aspects of the operation as will be 
further descnbed below. Alternatively, the Montgomery multiplication routine may be hardwired as discrete logic circuits 
that are optimized to perform the various functions of the routine. 

As illustrated in Fig. 6. the Montgomery multiplication routine includes a major loop and two minor loops In each 
major loop, a distinct word of a multiplicand b, is multiplied by each of the words of a multiplicand a where j is the 
number of words in multiplicand a, and i is the number of words in multiplicand b. The Montgomery multiplication routine 
is called at step 301. The two multiplicands a, and t>, are loaded into respective registers at step 302 along with a 
square flag. If the two multiplicands ^ and b, are equal, the square flag is set to one so that a squaring speed-up 
subroutine may be called at step 400. The squaring speed-up subroutine will be described in greater detail below. If 
the two multiplicands aj and b,- are not equal, then the square flag is set to zero. 

Before initiating the first major loop, i is set to be equal to one at step 305 so that the first word of multiplicand b 
.s accessed. The square flag is checked at step 306 to determine whether the squaring speed-up subroutine should 
be called, and if not, j is set equal to one at step 307. The two words aj and b, are multiplied together within the first 
minor loop at step 308, and the product added to the previous carry and previous c=. It should be appreciated that in 
the first pass through the routine, the carry and c, values are zero. The lower word of the result is stored as c and the 
higher word of the result is used as the next carry. The first minor loop is repeated by incrementing j at step 310 until 
the last word of a, .s detected at step 309, which ends the first minor loop. Before starting the second minor loop a 
special reduction value is calculated that produces all '0-s for the lowest word of a when muftiplied with c f and j is set 
to two at step 311. Thereafter, at step 312, the special reduction value is multiplied by the modulus n,, and added to 
the prev.ous carry and Cj . The lower word of the result is stored as and the higher word of the result is used as the 
next carry. The second minor loop is repeated by incrementing j at step 31 4 until the last word of c= is detected at step 
31 3, which ends the second minor loop. Once the second minor loop ends, i is incremented at step 31 6 and the major 
loop is repeated until the last word of b, has 'passed through the major loop. Then, the modular reduclion of the final 
result of Cj with respect to n is obtained at step 317, and the Montgomery multiplication routine ends at step 318 An 
example of a Montgomery multiplication of ^ with b, in which both multiplicands are four words long is provided at Fig 
9. In the example, the symbol 2 is used to denote the combination of all previous values. 

The Montgomery multiplication routine of Fig. 6 can be speeded up when used to square a number by recognizing 
that some of the partial products of the multiplication are equal. In particular, when multiplicand a is equal to multiplicand 
bj, i.e., a squaring operation, then the partial products of various components of the multiplication would ordinarily be 
repeated, e.g., the partial product of a 2 with b 3 is equal to the partial product of a 3 with b 2 . As illustrated in Fig 9 both 
of these partial products occur during the third major loop iteration. Thus, the first time the partial product is encountered 
it can be multiplied by two to account for the second occurrence, and a full multiplication of the second partial product 
can be skipped. Multiplication by two constitutes a single left shift for a binary number, and is significantly faster than 
a full multiplication operation. It should be appreciated that a great number of squaring operations are performed by 
the modular exponentiator 20 due to the operation of the exponent bit-scanning routing described above and an in- 
crease in speed of the squaring operations would have a significant effect on the overall processing time for a particular 
modular exponentiation. 

Fig. 7 illustrates a flow chart describing the squaring speed-up subroutine, which is called at step 401 Initially, j is 
set to be equal to , at step 402, which, in the first iteration of the major loop of Fig. 6. will be equal to one. In subsequent 
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iterations of the major loop, however, it should be apparent that j will begin with the latest value of i and will thus skip 
formation of partial products that have already been encountered. At step 403, i is compared to j. If i is equal to j, then 
at step 405 a factor is set to one, and if i and j are not equal, then at step 404 the factor is set to two. Thereafter, in 
step 406, aj and b { and the factor are multiplied together and the product added to the prevbus carry and Cj. As in step 
308 of Fig. 6, the lower word of the result is stored as Cj and the higher word of the result is used as the next carry. 
After completing the multiplication step 406, j is incremented at step 408 and the loop is repeated until the last word 
of bj has passed through the loop, at which time the squaring speed-up subroutine ends at step 409. At step 410 of 
Fig. 6, the Montgomery multiplication routine resumes just after the first minor loop. It should be appreciated that the 
squaring speed-up subroutine will operate in place of the first minor loop for every iteration of the major loop of the 
Montgomery multiplication routine when the squaring flag is set. 

In order to perform the Montgomery multiplication routine more efficiently, the multipliers 28a, 28b are tailored to 
perform specific operations. In particular, the multipliers 28a, 28b include specific functions for multiplying by two (used 
by the squaring speed-up routine), executing an a*b+c function, and performing the mod 2P function on a 2n-bit result 
while leaving the higher n bits in a carry register. The implementation of such tailored functionality for a multiplier device 
is considered to be well known in the art. 

Having thus described a preferred embodiment of a high speed modular exponent iator, it should be apparent to 
those skilled in the art that certain advantages of the within system have been achieved. It should also be appreciated 
that various modifications, adaptations, and alternative embodiments thereof may be made within the scope and spirit 
of the present invention. The invention is further defined by the following claims. 



Claims 

1. A method for computing a modular exponentiatbn of the form b° mod n comprises the steps of: 

dividing said modular exponentiation into at least first and second portions respectively having reduced mod- 
ulus values p and q, and a reduced base value b n than an original modulus value n and base value b of said 
modular exponentiation; 

factoring each portion of said modular exponentiation into respective pluralities of smaller modular exponen- 
tiations having precalculated exponent values; 

multiplying said respective pluralities of smaller modular exponentiations together to provide respective first 
level intermediate products, said multiplying step further comprising selectively reducing a total number of 
operations of said multiplying step for squaring of said respective pluralities of smaller modular exponentia- 
tions; 

subtracting said respective intermediate products from each other and adding one of said reduced modulus 
values to provide an intermediate sum value; 

multiplying said intermediate sum value with a precalculated inverse value to provide a second level interme- 
diate product; 

calculating a modular reduction of said second level intermediate product with respect to one of said reduced 
modulus values; 

multiplying said modular reduction of said second level intermediate product with the other one of said reduced 
modulus values; and 

adding one of said first and second modular exponentiation portions to said second level intermediate product. 

2. A method according to Claim 1 , wherein said first portion of said modular exponentiation comprises a component 
b*p mod p and said second portion of said modular exponentiation comprises a component bfq mod q in which 
p and q are prime numbers having a product equal to n. 

3. A method according to Claim 2, wherein p and q are of approximately equivalent magnitude. 

4. A method according to Claim 2, wherein said reduced base valued, of said component b r e p mod p is pre-calculated 
from b mod p. 

5. A method according to Claim 2, wherein said reduced base valued, of said component b*q mod q is pre-calculated 
from b mod q. 

6. A method according to any one of Claims 1 to 5, wherein said precalculated inverse value is of the form o~ 1 mod p. 
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7. A method according to any one of Claims 1 to 6, wherein said factoring step further comprises executing an ex- 
ponent bit-scanning process. 

8. A method according to any one of Claims 1 to 6, wherein said factoring step further comprises loading an exponent 
e of one of said portions of said modular exponentiation into a register, said register further comprising a shiftable 
window of a size substantially less than a corresponding size of said register. 

9. A method according to Claim 8, wherein said factoring step further comprises selecting a pre<omputed value 
comprising b raised to a selected exponent equivalent to a portion of exponent e read through said window 

1 0. A method according to Claim 9, wherein said factoring step further comprises modular squaring said pre^omputed 
value a number of times corresponding to successive bit shifts of said window relative to said exponent e. 

11. A method according to any one of Claims 1 to 10, wherein said multiplying step further comprises executing a 
15 Montgomery multiplication process. 

12. A method according to any one of Claims 1 to 10, wherein said multiplying step further comprises determining 
whether a particular multiplication of said respective pluralities of smaller modular exponentiations further com- 
prises a squaring operation. 

20 

1 3. A method according to Claim 12, wherein said multiplying step further comprises selectively reducing a total number 
of partial products of said particular multiplication when said multiplication comprises a squaring operation. 
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